-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Support writing timestamps with timezones with to_sql #22654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Support writing timestamps with timezones with to_sql #22654
Conversation
Hello @mroeschke! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #22654 +/- ##
=======================================
Coverage 92.17% 92.17%
=======================================
Files 169 169
Lines 50708 50708
=======================================
Hits 46740 46740
Misses 3968 3968
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! this was long overdue :-)
pandas/tests/io/test_sql.py
Outdated
# GH 9086 | ||
if self.flavor != 'postgresql': | ||
msg = "{} does not support datetime with time zone" | ||
pytest.skip(msg.format(self.flavor)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we test assert the behaviour for a database that does not support it? Eg what happens if you write a column with timezone aware data to mysql?
From the sqlalchemy docs, it seems the flag timezone=True
is simply ignored, but how are then the datetime values we send it stored? Does that change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the values send change:
In [67]: idx = pd.date_range("2012-01-01 09:00", periods=3, tz='Europe/Brussels')
In [68]: idx.values.astype('M8[us]').astype(object)
Out[68]:
array([datetime.datetime(2012, 1, 1, 8, 0),
datetime.datetime(2012, 1, 2, 8, 0),
datetime.datetime(2012, 1, 3, 8, 0)], dtype=object)
In [69]: idx.to_pydatetime()
Out[69]:
array([ datetime.datetime(2012, 1, 1, 9, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>),
datetime.datetime(2012, 1, 2, 9, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>),
datetime.datetime(2012, 1, 3, 9, 0, tzinfo=<DstTzInfo 'Europe/Brussels' CET+1:00:00 STD>)], dtype=object)
but not sure if that changes how the values are then stored in the mysql database (since the utc / naive time is still the same), but so this might be worth testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. From earlier tests databases that don't support timestamp with timezone
read back data as naive (don't know if this is local naive or converted to UTC first and then returned naive though). Will add some tests.
pandas/tests/io/test_sql.py
Outdated
def test_date_parsing(self): | ||
# No Parsing | ||
df = sql.read_sql_table("types_test_data", self.conn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we rather test that it is not parsed instead of removing it? (but I agree this currently looks like this is not doing much)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. My cursory glance thought it was duplicate of the line below; you're right, I will try to add an assert to this result as well
pandas/tests/io/test_sql.py
Outdated
df.to_sql('test_datetime_tz', self.conn) | ||
|
||
expected = df.copy() | ||
expected['A'] = expected['A'].dt.tz_convert('UTC') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that I remember correctly: when reading, and we get datetime objects with an offset (if there is a timestamp with timezone
), we always convert to utc because we cannot really know what is the timezone? (we get a simple 'fixed offset' ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, so we should expect this round trip test to load US/Pacific
and return UTC
@jorisvandenbossche so looks like based on the passing tests, databases without timezone support store those the data as naive local time. |
Codecov Report
@@ Coverage Diff @@
## master #22654 +/- ##
=======================================
Coverage 92.24% 92.24%
=======================================
Files 161 161
Lines 51224 51224
=======================================
Hits 47254 47254
Misses 3970 3970
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.24.0.txt
Outdated
@@ -179,6 +179,7 @@ Other Enhancements | |||
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`) | |||
- :func:`~DataFrame.to_csv`, :func:`~Series.to_csv`, :func:`~DataFrame.to_json`, and :func:`~Series.to_json` now support ``compression='infer'`` to infer compression based on filename extension (:issue:`15008`). | |||
The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`). | |||
- :func:`to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` columns (:issue:`9086`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be
:meth:`DataFrame.to_sql`
So the only significant consequence is described here in the pandas/doc/source/whatsnew/v0.24.0.txt Line 617 in de62788
Roundtrips Before:
Roundtrips After:
So addressing your last comment's bullet points:
|
Yeah given this discussion; it would be wise to highlight this change as its own section in the whatsnew. |
@jorisvandenbossche added a new whatsnew section in API detailing the before and after implications of this change. |
So yesterday I tried to make a small illustrative example showing the changed round-trip, but did it actually work before?
So that is my source of confusion. I thought it worked before (and you seemed to confirm it), so therefore I assumed that there is a change in behaviour. But is that actually the case? |
Hmm interesting. I made a (bad) assumption that the tests covered this case, but I actually don't see a test case where to_sql is tested with timezones. So the behavior you're seeing is probably correct. I'll try to confirm tonight on other dbs as well, but I guess it looks like I fixed a bug too :) |
I ran the same test on postgres @jorisvandenbossche and I got the same result. Therefore you're correct in that there really isn't a change in behavior since previously |
Added a fix for #23510 since it was a pretty easy addition to this PR |
Small issue with the new test I think https://travis-ci.org/pandas-dev/pandas/jobs/451739169#L1986
looks good otherwise. |
Thanks @TomAugspurger. Fixed that test (I don't think index name is expected to roundtrip in that test) |
@@ -1275,6 +1246,9 @@ MultiIndex | |||
I/O | |||
^^^ | |||
|
|||
- Bug in :meth:`to_sql` when writing timezone aware data (``datetime64[ns, tz]`` dtype) would raise a ``TypeError`` (:issue:`9086`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this to the "enhancements" though, I would say that timezones were simply never supported, so it is a nice enhancement that we will now actually support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now looking at the full diff (and not only what changed recently), and see you actually already have that. It's a bit duplicated now, but I am fine with keeping it in both places.
if col.dt.tz is not None: | ||
return TIMESTAMP(timezone=True) | ||
except AttributeError: | ||
# The column is actually a DatetimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for my understanding: where was this attribute error catched before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoop, sorry, again misunderstanding from not looking at the full diff :-)
Lets's finally merge this! @mroeschke thanks for the endurance :-) |
…fixed * upstream/master: (47 commits) CLN: remove values attribute from datetimelike EAs (pandas-dev#23603) DOC/CI: Add linting to rst files, and fix issues (pandas-dev#23381) PERF: Speeds up creation of Period, PeriodArray, with Offset freq (pandas-dev#23589) PERF: define is_all_dates to shortcut inadvertent copy when slicing an IntervalIndex (pandas-dev#23591) TST: Tests and Helpers for Datetime/Period Arrays (pandas-dev#23502) Update description of Index._values/values/ndarray_values (pandas-dev#23507) Fixes to make validate_docstrings.py not generate warnings or unwanted output (pandas-dev#23552) DOC: Added note about groupby excluding Decimal columns by default (pandas-dev#18953) ENH: Support writing timestamps with timezones with to_sql (pandas-dev#22654) CI: Auto-cancel redundant builds (pandas-dev#23523) Preserve EA dtype in DataFrame.stack (pandas-dev#23285) TST: Fix dtype mismatch on 32bit in IntervalTree get_indexer test (pandas-dev#23468) BUG: raise if invalid freq is passed (pandas-dev#23546) remove uses of (ts)?lib.(NaT|iNaT|Timestamp) (pandas-dev#23562) BUG: Fix error message for invalid HTML flavor (pandas-dev#23550) ENH: Support EAs in Series.unstack (pandas-dev#23284) DOC: Updating DataFrame.join docstring (pandas-dev#23471) TST: coverage for skipped tests in io/formats/test_to_html.py (pandas-dev#22888) BUG: Return KeyError for invalid string key (pandas-dev#23540) BUG: DatetimeIndex slicing with boolean Index raises TypeError (pandas-dev#22852) ...
git diff upstream/master -u -- "*.py" | flake8 --diff